Goto

Collaborating Authors

 category name



Supplementary Materials: AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

Neural Information Processing Systems

The series is directed by David Yates and distributed by Warner Bros. It consists of three fantasy films as of 2022: Fantastic Beasts and Where to Find Them (2016) [1]. The movie follows Newt Scamander, a magizoologist who travels to New York with a suitcase full of magical creatures. When some of the creatures escape, he teams up with a group of people to find them before they cause any harm.




1 Details for Dataset Partitioning Here we provide the dataset partitioning results for ImageNet [

Neural Information Processing Systems

Novel categories names:['High_Jump', 'Front_Crawl', 'Pole_V ault', 'Hammer_Throw', All experiments are conducted under the 16-shot setting. An incremental bayesian approach tested on 101 object categories. Conditional prompt learning for vision-language models.






Described Object Detection: Liberating Object Detection with Flexible Expressions

Neural Information Processing Systems

Detecting objects based on language information is a popular task that includes Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called *Described Object Detection* (DOD) by expanding category names to flexible language expressions for OVD and overcoming the limitation of REC only grounding the pre-existing object. We establish the research foundation for DOD by constructing a *Description Detection Dataset* ($D^3$). This dataset features flexible language expressions, whether short category names or long descriptions, and annotating all described objects on all images without omission. By evaluating previous SOTA methods on $D^3$, we find some troublemakers that fail current REC, OVD, and bi-functional methods. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods face constraints with long and complex descriptions. Recent bi-functional methods also do not work well on DOD due to their separated training procedures and inference strategies for REC and OVD tasks. Building upon the aforementioned findings, we propose a baseline that largely improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at https://github.com/shikras/d-cube